Incremental Web-Site Boundary Detection Using Random Walks

Authors

  • Ayesh Alshukri
  • Frans Coenen
  • Michele Zito
Abstract

The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so-called Web-site Boundary Detection (WBD) problem. The suggested advantages of these techniques are that they can quickly identify most of the pages belonging to a web-site and, in the long run, return a solution with accuracy comparable to (if not better than) that of other clustering methods. We analyze our techniques on artificial clones of the web generated using a well-known preferential attachment method.

Keywords: Web Site Boundary Detection, Random Walk Techniques, Web Site Clustering
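The abstract mentions artificial web graphs generated by preferential attachment but does not say which generator was used. As a rough illustration only, the sketch below grows a directed graph in a Barabási-Albert style: each new page links to a fixed number of existing pages, chosen with probability that increases with the links they have already received. The function name `preferential_attachment_graph` and its parameters `n_pages`, `links_per_page`, and `seed` are assumptions made for this example and are not taken from the paper.

```python
import random


def preferential_attachment_graph(n_pages, links_per_page, seed=None):
    """Return a directed edge list [(source, target), ...] over pages 0..n_pages-1.

    Hypothetical helper for illustration; not the generator used in the paper.
    """
    if links_per_page < 1 or n_pages <= links_per_page:
        raise ValueError("need 1 <= links_per_page < n_pages")
    rng = random.Random(seed)
    edges = []
    # Seed pages: one pool entry each, so they can be chosen before
    # they have received any links.
    pool = list(range(links_per_page))
    for new_page in range(links_per_page, n_pages):
        targets = set()
        # Sampling from the pool picks a page with probability proportional
        # to (1 + number of links it has received so far).
        while len(targets) < links_per_page:
            targets.add(rng.choice(pool))
        chosen = sorted(targets)
        edges.extend((new_page, target) for target in chosen)
        # Every chosen target gains one pool entry; the new page enters the
        # pool with a single entry of its own.
        pool.extend(chosen)
        pool.append(new_page)
    return edges


if __name__ == "__main__":
    graph = preferential_attachment_graph(n_pages=10, links_per_page=2, seed=0)
    print(len(graph), "links generated")
```

Keeping a pool with repeated entries is a simple way to sample proportionally to link count without maintaining explicit weights; a real experiment would more likely rely on an established graph library.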

Similar resources

Web-Site Boundary Detection Using Incremental Random Walk Clustering

In this paper we describe a random walk clustering technique to address the Website Boundary Detection (WBD) problem. The technique is fully described and compared with alternative (breadth and depth first) approaches. The reported evaluation demonstrates that the random walk technique produces comparable or better results than those produced by these alternative techniques, while at the same t...

Random Walks in the Quarter Plane Absorbed at the Boundary: Exact and Asymptotic

Nearest neighbor random walks in the quarter plane that are absorbed when reaching the boundary are studied. The cases of positive and zero drift are considered. Absorption probabilities at a given time and at a given site are made explicit. The following asymptotics for these random walks starting from a given point (n0, m0) are computed: that of probabilities of being absorbed at a given sit...

Number of times a site is visited in two-dimensional random walks.

In this paper, formulas are derived to compute the mean number of times a site has been visited in a random walk on a two-dimensional lattice. Asymmetric random walks are considered, with or without drift, for different boundary conditions. It is shown that in case of absorbing boundaries the mean number of visits reaches stationary values over the lattice; comparisons with a Monte Carlo simula...

Saddlepoint Approximations and Nonlinear Boundary Crossing Probabilities of Markov Random Walks by Hock

Saddlepoint approximations are developed for Markov random walks Sn and are used to evaluate the probability that (j − i)g((Sj − Si)/(j − i)) exceeds a threshold value for certain sets of (i, j). The special case g(x) = x reduces to the usual scan statistic in change-point detection problems, and many generalized likelihood ratio detection schemes are also of this form with suitably chosen g. W...

Web-Site Boundary Detection

Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to boundary detection is suggested. The principal issue is feature selection, hampered by the observation that there is no clear understanding of what a web-site is. This paper proposes a definition of a web-site, founde...


Publication date: 2011